Sparse Hopfield network reconstruction with $\ell_1$ regularization
We propose an efficient strategy to infer sparse Hopfield networks based on the magnetizations
and pairwise correlations measured through Glauber sampling. This strategy incorporates the $\ell_1$
regularization into the Bethe approximation, and further reduces the inference error of the
Bethe approximation without regularization. The optimal regularization parameter is observed
to be of the order of $M^{-1/2}$, where $M$ is the number of independent samples.
I. INTRODUCTION



The inverse Ising problem has been intensively studied in statistical physics, computational biology and computer science
in the past few years [1-5]. Biological experiments or numerical simulations usually generate a large amount of
experimental data, e.g., $M$ independent samples $\{\boldsymbol{\sigma}^1, \boldsymbol{\sigma}^2, \ldots, \boldsymbol{\sigma}^M\}$ in which $\boldsymbol{\sigma}$ is an $N$-dimensional vector with binary
components ($\sigma_i = \pm 1$) and $N$ is the system size. The least structured model to match the statistics of the experimental
data is the Ising model



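$$P_{\rm Ising}(\boldsymbol{\sigma}) = \frac{1}{Z(\mathbf{h},\mathbf{J})}\exp\left[\sum_{i<j} J_{ij}\sigma_i\sigma_j + \sum_{i} h_i\sigma_i\right] \qquad (1)$$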



where the partition function $Z(\mathbf{h}, \mathbf{J})$ depends on the $N$-dimensional fields and $\frac{N(N-1)}{2}$-dimensional couplings. These
fields and couplings are chosen to yield the same first and second moments (magnetizations and pairwise correlations,
respectively) as those obtained from the experimental data. The inverse temperature $\beta = 1/T$ has been absorbed
into the strength of fields and couplings.
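As an illustration of what matching the first and second moments of Eq. (1) means, the following minimal sketch evaluates the Ising distribution exactly by enumerating all $2^N$ states. It is only practical for very small $N$, and the function name `ising_moments` is a hypothetical helper introduced here, not something defined in the paper.

```python
import itertools
import math

def ising_moments(h, J):
    """Exact partition function, magnetizations <s_i> and pair averages
    <s_i s_j> of the Ising model in Eq. (1), by brute-force enumeration.

    h: list of N fields; J: N x N symmetric list of lists, zero diagonal.
    The cost grows as 2**N, so this is a sanity check for small N only.
    """
    n = len(h)
    Z = 0.0
    m = [0.0] * n
    pair = [[0.0] * n for _ in range(n)]
    for s in itertools.product((-1, 1), repeat=n):
        # Exponent of Eq. (1): sum_{i<j} J_ij s_i s_j + sum_i h_i s_i
        exponent = sum(h[i] * s[i] for i in range(n))
        exponent += sum(J[i][j] * s[i] * s[j]
                        for i in range(n) for j in range(i + 1, n))
        w = math.exp(exponent)
        Z += w
        for i in range(n):
            m[i] += w * s[i]
            for j in range(n):
                pair[i][j] += w * s[i] * s[j]
    m = [x / Z for x in m]
    pair = [[x / Z for x in row] for row in pair]
    return Z, m, pair
```

For two spins with zero fields and coupling $J_{12}$, this returns $\langle\sigma_1\sigma_2\rangle = \tanh J_{12}$, which is a convenient check before moving to sampled data.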

Previous studies of the inverse Ising problem on the Hopfield model [6-10] lack a systematic analysis for treating sparse
networks. Inference of sparse networks also has important and wide applications in modeling vast amounts of
biological data, since real biological networks are not densely connected. To reconstruct a sparse network
from experimental data, an additional penalty term must be added to the cost function, as studied
in recovering sparse signals in the context of compressed sensing [11], [12] or in Ising model selection [13], [14]. This
strategy is known as $\ell_1$-regularization, which introduces an $\ell_1$-norm penalty to the cost function. The $\ell_1$-regularization
has been studied in the pseudo-likelihood approximation to the network inference problem [14] and in the setting of
sparse continuous perceptron memorization and generalization [15]. This technique has also been thoroughly discussed
in real neural data analysis using the selective cluster expansion method [16]. The cluster expansion method involves
repeated solution of the inverse Ising problem and the computation of the cluster entropy included in the expansion
(a cluster is a small subset of spins). To truncate the expansion, clusters with small entropy in absolute value are
discarded and the optimal threshold needs to be determined. Additionally, the cluster size should be small to reduce
the computational cost, while at each step a convex optimization of the cost function (see Eq. (9)) for the cluster must
be solved. This may be complicated in some cases. The pseudo-likelihood maximization method [14] also involves a
careful design of the numerical minimization procedure for the pseudo-likelihood (e.g., Newton descent method). In
this paper, we provide an alternative way to reconstruct the sparse network by combining the Bethe approximation
and the $\ell_1$-regularization, which is much simpler in practical implementation. We expect that the $\ell_1$-regularization
will improve the prediction of the Bethe approximation. To show the efficiency, we apply the method to the sparse
Hopfield network reconstruction.

The outline of the paper is as follows. The sparse Hopfield network is defined in Sec. II. In Sec. III, we present
the hybrid inference method using the Bethe approximation and $\ell_1$-regularization. We test our algorithm on single
instances in Sec. IV. Concluding remarks are given in Sec. V.



II. SPARSE HOPFIELD MODEL 



The Hopfield network has been proposed in Ref. [17] as an abstraction of biological memory storage and was found to
be able to store an extensive number of random unbiased patterns [18]. If the stored patterns are dynamically stable,
then the network is able to provide associative memory and its equilibrium behavior is described by the following
Hamiltonian:

$$\mathcal{H} = -\sum_{i<j} J_{ij}\sigma_i\sigma_j \qquad (2)$$

where the Ising variable $\sigma_i$ indicates the active state of the neuron ($\sigma_i = +1$) or the silent state ($\sigma_i = -1$). For the
sparse network storing $P$ random unbiased binary patterns, the symmetric coupling is constructed as

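$$J_{ij} = \frac{l_{ij}}{l}\sum_{\mu=1}^{P}\xi_i^{\mu}\xi_j^{\mu} \qquad (3)$$

where $\xi_i^{\mu} = \pm 1$ is the $i$-th component of the $\mu$-th stored pattern and $l$ is the average connectivity of the neuron. In the thermodynamic limit, $P$ scales as $P = \alpha l$ where $\alpha$ is the
memory load. No self-interactions are assumed and the connectivity $l_{ij}$ obeys the distribution:

$$P(l_{ij}) = \left(1 - \frac{l}{N-1}\right)\delta(l_{ij}) + \frac{l}{N-1}\,\delta(l_{ij} - 1). \qquad (4)$$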
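The construction of Eqs. (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions: each pair is connected independently with probability $l/(N-1)$, the number of patterns is taken as $P = \alpha l$ rounded to the nearest integer, and the function name `sparse_hopfield_couplings` and its `seed` argument are hypothetical.

```python
import random

def sparse_hopfield_couplings(n, l, alpha, seed=0):
    """Draw a sparse Hopfield coupling matrix following Eqs. (3)-(4).

    n: number of neurons; l: mean connectivity; alpha: memory load.
    Returns (J, xi) where J is an n x n symmetric list of lists with zero
    diagonal and xi is the list of P random unbiased +/-1 patterns.
    """
    rng = random.Random(seed)
    p = max(1, round(alpha * l))  # assumption: P = alpha * l, rounded
    xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # l_ij = 1 with probability l / (N - 1), otherwise 0 (Eq. (4))
            if rng.random() < l / (n - 1):
                hebb = sum(pattern[i] * pattern[j] for pattern in xi)
                J[i][j] = J[j][i] = hebb / l
    return J, xi
```

With `n=100`, `l=5` and `alpha=0.6`, this gives $P = 3$ patterns and on average five nonzero couplings per neuron, matching the setting used in the simulations below.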

Mean-field properties of the sparse Hopfield network have been discussed within the replica symmetric approximation
in Refs. [19, 20]. Three phases (paramagnetic, retrieval and spin glass phases) have been observed in this sparsely
connected Hopfield network with arbitrary finite $l$. For large $l$ (e.g., $l = 10$), the phase diagram closely resembles that
of the extremely diluted case [21], [22], where the transition line between the paramagnetic and retrieval phases is $T = 1$ for
$\alpha < 1$ and that between the paramagnetic and spin glass phases is $T = \sqrt{\alpha}$ for $\alpha > 1$. The spin glass/retrieval transition
occurs at $\alpha = 1$.

To sample the state of the original model Eq. (2), we apply the Glauber dynamics rule:

$$P(\sigma_i \to -\sigma_i) = \frac{1}{2}\left[1 - \sigma_i\tanh(\beta h_i)\right] \qquad (5)$$

where $h_i = \sum_{j\neq i} J_{ij}\sigma_j$ is the local field neuron $i$ feels. In practice, we first randomly generate a configuration, which
is then updated by the local dynamics rule Eq. (5) in a randomly asynchronous fashion. In this setting, we define
a Glauber dynamics step as $N$ proposed flips. The Glauber dynamics is run for a total of $3\times 10^6$ steps, of which
the first $1\times 10^6$ steps are used for thermal equilibration and the other $2\times 10^6$ steps for computing magnetizations
and correlations, i.e., $m_i = \langle\sigma_i\rangle_{\rm data}$, $C_{ij} = \langle\sigma_i\sigma_j\rangle_{\rm data} - m_i m_j$, where $\langle\cdots\rangle_{\rm data}$ denotes the average over the collected
data. The state of the network is sampled every 20 steps after thermal equilibration, which yields a total of $M = 100000$
independent samples. The magnetizations and correlations serve as inputs to the hybrid inference algorithm described below.
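The sampling protocol described above can be sketched as follows. This is a minimal, unoptimized illustration of random asynchronous Glauber updates with the flip probability of Eq. (5); the function names `glauber_sample` and `moments` and their arguments are hypothetical, and a production run at the sizes quoted above would need a faster implementation.

```python
import math
import random

def glauber_sample(J, beta, n_equil, n_measure, stride, seed=0):
    """Collect configurations from Glauber dynamics on couplings J.

    One step consists of N proposed single-spin flips at random sites.
    The first n_equil steps are discarded; afterwards the state is
    recorded every `stride` steps for n_measure steps.
    """
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    for step in range(n_equil + n_measure):
        for _ in range(n):
            i = rng.randrange(n)
            h_i = sum(J[i][j] * s[j] for j in range(n) if j != i)
            # Flip with probability (1/2)[1 - s_i tanh(beta h_i)], Eq. (5)
            if rng.random() < 0.5 * (1.0 - s[i] * math.tanh(beta * h_i)):
                s[i] = -s[i]
        if step >= n_equil and (step - n_equil) % stride == 0:
            samples.append(list(s))
    return samples

def moments(samples):
    """Magnetizations m_i and connected correlations C_ij from samples."""
    M = len(samples)
    n = len(samples[0])
    m = [sum(s[i] for s in samples) / M for i in range(n)]
    C = [[sum(s[i] * s[j] for s in samples) / M - m[i] * m[j]
          for j in range(n)] for i in range(n)]
    return m, C
```

In the notation of the text, the reported setting corresponds to `n_equil=10**6`, `n_measure=2 * 10**6` and `stride=20`, which gives $M = 10^5$ recorded configurations.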



III. BETHE APPROXIMATION WITH $\ell_1$ REGULARIZATION

The Bethe approximation assumes that the joint probability (Boltzmann distribution, see Eq. (1)) of the neuron
activity can be written in terms of the single-neuron marginal for each neuron and the two-neuron marginal for each
pair of adjacent neurons as

$$P_{\rm Ising}(\boldsymbol{\sigma}) \simeq \prod_{(ij)}\frac{P_{ij}(\sigma_i,\sigma_j)}{P_i(\sigma_i)P_j(\sigma_j)}\prod_{i}P_i(\sigma_i) \qquad (6)$$

where $(ij)$ runs over all distinct pairs of neurons. Under this approximation, the free energy ($-\ln Z$) can be expressed
as a function of connected correlations $\{C_{ij}\}$ (between neighboring neurons) and magnetizations $\{m_i\}$. The stationary
point of the free energy with respect to the magnetizations yields the following self-consistent equations:

$$m_i = \tanh\left(h_i + \sum_{j\in\partial i}\tanh^{-1}\big(t_{ij}f(m_j, m_i, t_{ij})\big)\right) \qquad (7)$$

where $\partial i$ denotes the neighbors of $i$, $t_{ij} = \tanh J_{ij}$ and $f(x, y, t) = \frac{1 - t^2 - \sqrt{(1-t^2)^2 - 4t(x - yt)(y - xt)}}{2t(y - xt)}$. Using the linear response
relation to calculate the connected correlations for any pair of neurons, we obtain the Bethe approximation (BA) to
the inverse Ising problem [23, 24]



$$J_{ij} = -\tanh^{-1}\left[\frac{1}{2(C^{-1})_{ij}}\sqrt{1 + 4L_iL_j(C^{-1})_{ij}^2} - m_im_j - \frac{1}{2(C^{-1})_{ij}}\sqrt{\left(\sqrt{1 + 4L_iL_j(C^{-1})_{ij}^2} - 2m_im_j(C^{-1})_{ij}\right)^2 - 4(C^{-1})_{ij}^2}\,\right] \qquad (8)$$

where $C^{-1}$ is the inverse of the connected correlation matrix and $L_i = 1 - m_i^2$. The couplings have been scaled by the
inverse temperature $\beta$. Note that the fields can be predicted using Eq. (7) after we get the set of couplings. Hereafter
we consider only the reconstruction of the coupling vector. In fact, the BA solution of the couplings corresponds to the
fixed point of the susceptibility propagation [7], [25], yet it avoids the iteration steps in susceptibility propagation and
the possible non-convergence of the iterations. It was also found that the BA yields a good estimate of the underlying
couplings of the Hopfield network [7]. In the following analysis, we assume that the BA is a good approximation to
the inverse Ising problem on the sparse Hopfield network and try to improve the prediction of the BA with $\ell_1$-regularization.
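The BA inversion of Eq. (8) only needs the magnetizations and one inverse of the connected correlation matrix. The sketch below is a minimal pure-Python illustration; the helper names `invert` and `bethe_couplings`, the small-value cutoff `eps`, and the clamping of the $\tanh^{-1}$ argument (useful for noisy data) are assumptions added here and are not part of the original formula.

```python
import math

def invert(A):
    """Gauss-Jordan inverse with partial pivoting (assumes A is invertible)."""
    n = len(A)
    aug = [[float(v) for v in A[r]] + [1.0 if r == c else 0.0 for c in range(n)]
           for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def bethe_couplings(m, C, eps=1e-12):
    """Couplings from Eq. (8), given magnetizations m and connected
    correlation matrix C (with C_ii = 1 - m_i**2)."""
    n = len(m)
    Cinv = invert(C)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x = Cinv[i][j]
            if i == j or abs(x) < eps:
                continue  # Eq. (8) tends to J_ij = 0 as (C^-1)_ij -> 0
            LiLj = (1.0 - m[i] ** 2) * (1.0 - m[j] ** 2)
            a = math.sqrt(1.0 + 4.0 * LiLj * x * x)
            inner = (a - 2.0 * m[i] * m[j] * x) ** 2 - 4.0 * x * x
            arg = (a / (2.0 * x) - m[i] * m[j]
                   - math.sqrt(max(inner, 0.0)) / (2.0 * x))
            # Assumption: clamp to keep atanh finite on noisy inputs
            arg = max(-1.0 + 1e-12, min(1.0 - 1e-12, arg))
            J[i][j] = -math.atanh(arg)
    return J
```

For a single pair of spins the Bethe factorization is exact, so feeding the exact moments of a two-spin system into `bethe_couplings` should return the original coupling up to rounding error.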

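The cost function to be minimized in the inverse Ising problem can be written as the rescaled negative log-likelihood
function [26]:

$$S(\mathbf{h},\mathbf{J}|\mathbf{m},\mathbf{C}) = -\frac{1}{M}\ln\left[\prod_{\mu=1}^{M}P_{\rm Ising}(\boldsymbol{\sigma}^{\mu}|\mathbf{h},\mathbf{J})\right] = \ln Z(\mathbf{h},\mathbf{J}) - \mathbf{h}^{T}\mathbf{m} - \frac{1}{2}{\rm tr}(\mathbf{J}\tilde{\mathbf{C}}) \qquad (9)$$

where $m_i = \langle\sigma_i\rangle_{\rm data}$ and $\tilde{C}_{ij} = \langle\sigma_i\sigma_j\rangle_{\rm data}$. $\mathbf{h}^{T}$ denotes the transpose of the field vector while ${\rm tr}(A)$ denotes the trace
of matrix $A$. The minimization of $S(\mathbf{h}, \mathbf{J}|\mathbf{m}, \mathbf{C})$ in the $\frac{N(N+1)}{2}$-dimensional space of fields and couplings yields the
following equations:

$$m_i = \langle\sigma_i\rangle, \qquad (10a)$$
$$C_{ij} = \langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle, \qquad (10b)$$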

where the average is taken with respect to the Boltzmann distribution Eq. (1) with the optimal fields and couplings
(corresponding to the minimum of $S$). Actually, one can use the Bethe approximation to compute the connected correlation
in the right-hand side of Eq. (10b), which leads to the result of Eq. (8). To proceed, we expand the cost function
around its minimum with respect to the fluctuation of the coupling vector up to the second order as

$$S(\mathbf{J}) \simeq S(\mathbf{J}_0) + DS(\mathbf{J}_0)^{T}\tilde{\mathbf{J}} + \frac{1}{2}\tilde{\mathbf{J}}^{T}D^{2}S(\mathbf{J}_0)\tilde{\mathbf{J}} \qquad (11)$$

where $\tilde{\mathbf{J}}$ defines the fluctuation $\tilde{\mathbf{J}} \equiv \mathbf{J} - \mathbf{J}_0$ and $\mathbf{J}_0$ is the optimal coupling vector. $DS(\mathbf{J}_0)$ is the gradient of $S$
evaluated at $\mathbf{J}_0$, and $D^{2}S(\mathbf{J}_0)$ is the Hessian matrix. We have only kept the coupling dependence of $S$ for simplicity,
although it is also a function of $\mathbf{h}$, $\mathbf{m}$ and $\mathbf{C}$. The first order coefficient vanishes due to Eq. (10). Note that the
Hessian matrix is an $N(N-1)/2 \times N(N-1)/2$ symmetric matrix whose dimension is much higher than that of
the connected correlation matrix. However, to construct the couplings around neuron $i$, we consider only the neuron
$i$-dependent part, i.e., we set $l = i$ in the Hessian matrix $\chi_{ij,kl} = \langle\sigma_i\sigma_j\sigma_k\sigma_l\rangle - \langle\sigma_i\sigma_j\rangle\langle\sigma_k\sigma_l\rangle$, where $ij$ and $kl$ run over
distinct pairs of neurons. Since $\sigma_i^2 = 1$, the retained block reads $\chi_{ij,ki} = \tilde{C}_{jk} - \tilde{C}_{ij}\tilde{C}_{ki}$ with $\tilde{C}_{ij} = \langle\sigma_i\sigma_j\rangle$. This simplification reduces the computational cost but still keeps the significant contribution,
as shown later in our simulations. Finally we obtain

$$S(\mathbf{J}) \simeq S(\mathbf{J}_0) + \frac{1}{2}\sum_{j,k\neq i}\tilde{J}_{ij}\left(\tilde{C}_{jk} - \tilde{C}_{ij}\tilde{C}_{ki}\right)\tilde{J}_{ki} + \lambda\sum_{j\neq i}\left|J_{0,ij} + \tilde{J}_{ij}\right| \qquad (12)$$



where an $\ell_1$-norm penalty has been added to promote the selection of sparse network structure [13], [14], [27]. $\lambda$ is
a positive regularization parameter to be optimized to make the inference error as low as possible. The $\ell_1$-norm
penalizes small but non-zero couplings, and increasing the value of the regularization parameter $\lambda$ makes the inferred
network sparser. In the following analysis, we assume $\mathbf{J}_0$ is provided by the BA solution (a good approximation to
reconstruct the sparse Hopfield network [7]), then we search for the new solution that minimizes the regularized cost
function Eq. (12). Finally we get the new solution as follows,



$$J_{ij}^{(i)} = J_{0,ij} - \lambda\sum_{k}{\rm sgn}(J_{0,ik})\left[C^{i}\right]^{-1}_{kj} \qquad (13)$$

where ${\rm sgn}(x) = x/|x|$ for $x \neq 0$ and $(C^{i})_{kj} = \tilde{C}_{kj} - \tilde{C}_{ik}\tilde{C}_{ji}$, with $\tilde{C}_{ij} = \langle\sigma_i\sigma_j\rangle_{\rm data}$. To ensure the symmetry of the couplings, we construct
$J_{ij} = \frac{1}{2}(J_{ij}^{(i)} + J_{ij}^{(j)})$, where $J_{ij}^{(j)}$ is also given by Eq. (13) in which $i$ and $j$ are exchanged. The inverse of $C^{i}$ or $C^{j}$
takes a computation time of the order $\mathcal{O}(N^3)$, much smaller than that of the inverse of a susceptibility matrix $\chi$.

FIG. 1: (Color online) Improvement of the prediction by $\ell_1$-regularized BA on the sparse Hopfield networks. Network size
$N = 100$, the memory load $\alpha = 0.6$ and the mean node degree $l = 5$. Each data point is the average over five random sparse
networks. The regularization parameter has been optimized. The inset gives an enlarged view for the high temperature region.

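The correction of Eq. (13) followed by the symmetrization step can be sketched as below. Several choices here are assumptions made for illustration rather than statements from the text: the matrix $C^{i}$ is restricted to indices $k, j \neq i$ (including $i$ would give a zero row because $\langle\sigma_i\sigma_i\rangle = 1$), ${\rm sgn}(0)$ is taken to be $0$, the input `Ct` holds the non-connected averages $\langle\sigma_i\sigma_j\rangle_{\rm data}$, and the names `invert`, `sign` and `l1_regularized_couplings` are hypothetical.

```python
def invert(A):
    """Gauss-Jordan inverse with partial pivoting (assumes A is invertible)."""
    n = len(A)
    aug = [[float(v) for v in A[r]] + [1.0 if r == c else 0.0 for c in range(n)]
           for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def sign(x):
    """sgn(x) = x/|x| for x != 0; assumption: sgn(0) = 0."""
    return (x > 0) - (x < 0)

def l1_regularized_couplings(J0, Ct, lam):
    """Apply Eq. (13) around every neuron i, then symmetrize.

    J0: BA couplings (n x n); Ct: pair averages <s_i s_j> (n x n);
    lam: regularization parameter lambda.
    """
    n = len(J0)
    Ji = [[0.0] * n for _ in range(n)]  # Ji[i][j] stores J^{(i)}_{ij}
    for i in range(n):
        others = [k for k in range(n) if k != i]
        # (C^i)_{kj} = Ct_kj - Ct_ik * Ct_ji, restricted to k, j != i
        Ci = [[Ct[k][j] - Ct[i][k] * Ct[j][i] for j in others] for k in others]
        Ci_inv = invert(Ci)
        for b, j in enumerate(others):
            shift = sum(sign(J0[i][k]) * Ci_inv[a][b]
                        for a, k in enumerate(others))
            Ji[i][j] = J0[i][j] - lam * shift
    # J_ij = (J^{(i)}_ij + J^{(j)}_ij) / 2, with zero diagonal
    return [[0.0 if i == j else 0.5 * (Ji[i][j] + Ji[j][i]) for j in range(n)]
            for i in range(n)]
```

When the measured pair averages are uncorrelated (`Ct` equal to the identity), $C^{i}$ is the identity as well and each coupling is simply shifted by $-\lambda\,{\rm sgn}(J_{0,ij})$, which makes the role of $\lambda$ as a shrinkage strength explicit.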
We remark here that minimizing the regularized cost function Eq. (12) corresponds to finding the optimal deviation $\tilde{\mathbf{J}} = \mathbf{J} - \mathbf{J}_0$
from the BA solution. We also assume that for small $\lambda$, the deviation is small as
well. An equation similar to Eq. (13) has been derived in the context of reconstructing a sparse asymmetric, asynchronous
Ising network [28]. Here we derive the inference equation (Eq. (13)) for the static reconstruction of a sparse network.
We will show in the next section the efficiency of this hybrid strategy in improving the prediction of the BA without
regularization. To evaluate the efficiency, we define the reconstruction error as



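$$\Delta_J = \left[\frac{2}{N(N-1)}\sum_{i<j}\left(J_{ij}^{*} - J_{ij}^{\rm true}\right)^2\right]^{1/2} \qquad (14)$$

where $J_{ij}^{*}$ is the inferred coupling while $J_{ij}^{\rm true}$ is the true one constructed according to Eq. (3).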
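Eq. (14) is a root mean squared error over the $N(N-1)/2$ distinct pairs. A minimal sketch, with the hypothetical function name `reconstruction_error`, is:

```python
import math

def reconstruction_error(J_inferred, J_true):
    """Root mean squared coupling error over distinct pairs i < j, Eq. (14)."""
    n = len(J_true)
    sq = sum((J_inferred[i][j] - J_true[i][j]) ** 2
             for i in range(n) for j in range(i + 1, n))
    return math.sqrt(2.0 * sq / (n * (n - 1)))
```

Because the sum runs over all pairs, couplings that are absent in the true network but inferred as small nonzero values contribute to $\Delta_J$ as well.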



IV. RESULTS AND DISCUSSIONS 



We simulate the sparsely connected Hopfield network of size $N = 100$ at different temperatures. The average
connectivity for each neuron is $l = 5$ and the memory load is $\alpha = 0.6$. As shown in Fig. 1, the $\ell_1$-regularization in Eq. (13)
does improve the prediction on the sparse network reconstruction. The improvement is evident in the presence of
high quality data (e.g., in the high temperature region, see the inset of Fig. 1). However, the gap gets smaller as the
temperature decreases. This may be due to insufficient sampling [10] of glassy states at low temperatures.
The glassy phase is typically characterized by a complex energy landscape exhibiting numerous local minima. We
also explore the effects of the regularization parameter on the reconstruction, which are reported in Fig. 2. With
increasing $\lambda$, the inference error first decreases, then reaches a minimal value, followed by an increasing trend in the
range we plot in Fig. 2. Interestingly, the optimal value of $\lambda$ yielding the lowest inference error is of the order
$\mathcal{O}(\sqrt{1/M})$ for fixed network size, which is consistent with that found in Refs. [1], [13]. This implies that the optimal
regularization parameter guides our inference procedure to a sparse network closest to the original one. We also
find that the magnitude of this parameter is less sensitive to the temperature, although the specific optimal
value becomes slightly different across different instances of the sparse networks in the low temperature region. The
number of samples $M$ determines the order of magnitude, which helps us find the appropriate strength for the
regularization parameter. In real applications, the true coupling vector is a priori unknown. In this case, the
regularization parameter can be chosen to make the difference between the measured moments and those produced
by the reconstructed Ising model as small as possible. As confirmed by our numerical simulations on sparse Hopfield
networks, the regularization term provides an accurate pruning of the network with a lower inference error if the
regularization parameter is carefully chosen. Therefore the BA plus the $\ell_1$-regularization is an efficient strategy for
the sparse Hopfield network inference.

FIG. 2: (Color online) Reconstruction error $\Delta_J$ versus the regularization parameter $\lambda$ at $T = 1.4$. Inference results on three
random instances are shown. The inference errors by applying BA without regularization on the three random instances are
$\Delta_J = 0.006108$, $0.006049$, $0.005981$ respectively.
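Since the optimal $\lambda$ is observed to be of the order of $M^{-1/2}$, a practical scan can be centered on that value. The sketch below builds a logarithmic grid around $1/\sqrt{M}$ and selects the candidate with the lowest score. The grid width, the number of points, the function names `lambda_candidates` and `select_lambda`, and the user-supplied `score` callable (for example $\Delta_J$ when the true couplings are known, or a moment mismatch otherwise) are all assumptions made for illustration.

```python
import math

def lambda_candidates(M, decades=1.0, points=11):
    """Log-spaced candidates spanning `decades` on each side of M**-0.5."""
    center = 1.0 / math.sqrt(M)
    exponents = [-decades + 2.0 * decades * k / (points - 1)
                 for k in range(points)]
    return [center * 10.0 ** e for e in exponents]

def select_lambda(candidates, score):
    """Return the candidate that minimizes the user-supplied score."""
    return min(candidates, key=score)
```

For $M = 10^5$ the grid is centered at $\lambda \approx 3.2\times 10^{-3}$ and covers one decade on either side.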



V. CONCLUDING REMARKS

We propose an efficient hybrid inference strategy for reconstructing the sparse Hopfield network. This strategy
combines the Bethe approximation and the $\ell_1$-regularization by expanding the objective function (negative log-likelihood
function) up to the second order of the coupling fluctuation around its optimal value. The hybrid strategy improves the
prediction by zeroing couplings which are actually not present in the network. We can control the accuracy by tuning
the regularization parameter. The magnitude of its optimal value is determined by the number of independent
samples $M$. By varying the value of the regularization parameter, we show that the reconstruction error first decreases
and then increases after the lowest error is reached. We observe this phenomenon in the sparse Hopfield network
reconstruction, and this behavior may be different in other cases [16]. The approximate reconstruction method we
provide in this paper is expected to be valid in reconstructing other diluted mean-field models.